ABSTRACT

The luminosity functions of galaxies and quasars provide invaluable information about galaxy and quasar formation. Estimating the luminosity function from magnitude limited samples is relatively straightforward, provided that the distances to the objects in the sample are known accurately; techniques for doing this have been available for about thirty years. However, distances are usually known accurately for only a small subset of the sample. This is true of the objects in the Sloan Digital Sky Survey, and will be increasingly true of the next generation of deep multi-color photometric surveys. Estimating the luminosity function when distances are only known approximately (e.g., photometric redshifts are available, but spectroscopic redshifts are not) is more difficult. I describe two algorithms which can handle this complication: one is a generalization of the Vmax algorithm, and the other is a maximum likelihood approach. Because these methods account for uncertainties in the distance estimate, they are relevant to a broader range of studies. For example, they are useful for studying the abundances of galaxies which are sufficiently nearby that the contribution of peculiar velocity to the spectroscopic redshift is not negligible, so only a noisy estimate of the true distance is available. In this respect, peculiar velocities and photometric redshift errors have similar effects. The methods developed here are also useful for estimating the stellar luminosity function in samples where accurate parallax distances are not available.

Key words: methods: analytical - galaxies: formation - galaxies: haloes - dark matter 
- large scale structure of the universe 



1 INTRODUCTION 

Estimates of the distribution of distances to galaxies, and of the galaxy luminosity function and its evolution, provide useful constraints on models of galaxy formation. Current (e.g. the SDSS, York et al. 2000; Combo-17, Wolf et al. 2003; MUSYC, Marchesini et al. 2007) and planned surveys (e.g., DES, LSST) go considerably deeper in multicolor photometry than in spectroscopy, or are entirely photometric. For such surveys, reasonably accurate photometric redshift estimates (e.g. Hyper-z, Bolzonella et al. 2000, and ANNz, Collister & Lahav 2003) can or will be made. In the case of Luminous Red Galaxies (LRGs; e.g. Eisenstein et al. 2001), the photometric redshifts may actually be quite accurate (e.g. Padmanabhan et al. 2004; Weinstein et al. 2004; Collister et al. 2006). The number of objects with photometric redshifts typically exceeds the number for which spectroscopic redshifts are available by more than an order of magnitude. This is also true of new quasar detection algorithms. Whereas the SDSS will obtain spectra of about one hundred thousand quasars, the Non-parametric Bayesian Classification algorithm of Richards et al. (2004) has identified one million quasars using SDSS photometry. Large photometric samples of galaxies and quasars offer the potential of studying cosmological evolution at a fraction of the cost of a full spectroscopic survey.

Larger samples are not only better for studying the evolution of the galaxy and quasar populations; they also make new science possible. In the case of galaxies, the detection of the integrated Sachs-Wolfe effect (Fosalba et al. 2003; Scranton et al. 2003; Padmanabhan et al. 2005; Cabré et al. 2006) required the larger photometric LRG catalog. In the case of quasars, the SDSS spectroscopic sample is barely large enough to detect the gravitational lensing magnification bias signal; the larger photometric sample made a measurement with high statistical significance possible (Scranton et al. 2005).

With photometric redshift surveys becoming the norm, it is timely to devise methods for estimating the distribution of comoving distances and the evolution of the luminosity function in such samples. Broadly speaking, techniques for estimating the luminosity function from a magnitude limited catalog fall into two classes: one is based on the nonparametric $V_{\max}$ method outlined by Schmidt (1968); the other is a maximum likelihood analysis which can provide parametric or nonparametric estimates of the luminosity function (Sandage, Tammann & Yahil 1979; Efstathiou, Ellis & Peterson 1988; Springel & White 1998). Both methods assume that the distances are known precisely and accurately. The main goal of the present work is to generalize both types of methods to handle photometric redshifts. For reasons described below, the analysis which follows is best suited to studying objects where evolution and k-correction uncertainties are small. In practice, this means the methods are best suited to catalogs which contain objects of one spectral type. Removing this constraint is the subject of ongoing work.

Section 2 discusses a deconvolution algorithm for estimating $dN/dz$ and the luminosity function from photometric redshift samples. The estimator of the luminosity function is a generalization of the $V_{\max}$ method (Schmidt 1968), and the method uses the deconvolution algorithm described by Lucy (1974). Section 3 discusses a maximum likelihood approach. Some applications are discussed in Section 4, and a final section summarizes.
2 THE Vmax METHOD

I first outline why the problems of estimating $dN/dz$ and $\phi(L)$ are both best thought of as deconvolution problems. I then show that Lucy's deconvolution algorithm provides an efficient way of performing the deconvolution.

2.1 The redshift distribution: dN/dz

Let $dN/dz$ denote the number of objects which lie at redshift $z$ (since peculiar velocities are unlikely to be larger than a few thousand km/s, they do not make a significant change to the redshift if $z > 0.01$). Let $p(z_e|z)$ denote the probability of estimating the redshift as $z_e$ when the true value is $z$. Then the distribution of estimated redshifts is

$$\frac{dN_e(z_e)}{dz_e} = \int dz\; p(z_e|z)\,\frac{dN(z)}{dz}. \qquad (1)$$

To get an idea of what this implies, suppose that $p(z_e|z)$ is sharply peaked around the true value $z$. Then define $\Delta z = z_e - z$ and expand $dN/dz$ in a Taylor series around its value at $z_e$. This yields an expansion in $\Delta z$. If the estimated redshift is unbiased in the mean, then $\langle\Delta z\rangle = 0$ and the leading-order contribution is of the form

$$\frac{dN_e(z_e)}{dz_e} \approx \frac{dN(z_e)}{dz_e} + \frac{\langle\Delta z^2\rangle}{2}\,\frac{d^2[dN(z)/dz]}{dz^2}\bigg|_{z = z_e}. \qquad (2)$$

Typically, $dN/dz$ is well approximated by a constant times $z^2\exp[-(z/z_m)^{\alpha}]$, with $\alpha \approx 3/2$ and $z_m$ set by the luminosity function and the limiting magnitude of the catalog (i.e., $dN/dz \propto z^2$ at $z < z_m$, and it drops rapidly for $z \gg z_m$). In this case,

$$\frac{dN_e(z_e)}{dz_e} \approx \frac{dN(z_e)}{dz_e}\left[1 + \langle\Delta z^2\rangle\,C(z_e)\right], \qquad (3)$$

where, from equation (2),

$$C(z_e) = \frac{2 - \alpha(\alpha + 3)\,(z_e/z_m)^{\alpha} + \alpha^2\,(z_e/z_m)^{2\alpha}}{2\,z_e^2}. \qquad (4)$$

The term in square brackets shows how the estimated distribution $dN_e$ differs from the true one $dN$. In particular, it shows that an accurate estimate of $dN$ can be obtained by summing over all objects that have estimated redshift $z_e$, weighting each by the inverse of the term in square brackets in the expression above.

The general problem is to infer the shape of the intrinsic distribution $dN/dz$ given the measured distribution $dN_e/dz_e$, even if $p(z_e|z)$ is not sharply peaked. If $p(z_e|z)$ is known, and $dN_e/dz_e$ is measured, then equation (1) is an integral equation of the first kind, which can be solved to obtain the intrinsic $dN/dz$. This is possible even if $p(z_e|z)$ is fairly broad. Padmanabhan et al. (2004) describe a method to do this, but, for reasons made explicit in Lucy (1974), their method is not ideal. Before describing our method, the following section shows that estimating the intrinsic luminosity function from photometric redshift data is a similar deconvolution problem.



2.2 The luminosity distribution: φ(L)

Let $\phi(M|z)$ denote the number density of galaxies with absolute magnitude $M = -2.5\log_{10} L + {\rm const}$, where $L = 4\pi D_L^2(z)\,\ell$ is the luminosity, $\ell$ is the apparent brightness, and $D_L(z)$ is the luminosity distance at $z$. Assume for the moment that there is no evolution (extending the analysis to include evolution is the subject of work in progress). Then $\phi(M|z)$ is independent of $z$.

Simply adding up the total number of galaxies in a magnitude limited catalog which have luminosity $L$ and dividing by the total volume of the survey is not a good estimator of $\phi(L)$ itself, because the more luminous objects are visible to larger distances. Let $V_{\max}(M)$ denote the largest comoving volume out to which an object of absolute magnitude $M$ can be seen. If the catalog is limited at both ends, then there is a minimum volume below which the object would have been too bright to be included in the catalog: call this $V_{\min}(M)$. The number of galaxies with absolute magnitude $M$ in a catalog magnitude limited at both ends is

$$N(M) = \phi(M)\,\left[V_{\max}(M) - V_{\min}(M)\right]. \qquad (5)$$

Therefore, if we sum over all the galaxies in a magnitude limited catalog, and weight each object by the inverse of $V_{\max}(M) - V_{\min}(M)$, then we will actually have estimated the luminosity function. This is the basis of the $1/V_{\max}$ method (Schmidt 1968).
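A minimal sketch of the $1/V_{\max}$ estimator implied by equation (5) follows. It assumes Euclidean volumes $V = \Omega D^3/3$ with distances in Mpc, no k-corrections and no evolution; these simplifications, and the function names, are illustrative rather than part of the method as described in the text.

```python
import numpy as np


def limiting_distance(absmag, app_mag_limit):
    """Distance in Mpc at which absolute magnitude `absmag` appears at
    `app_mag_limit`, from m - M = 5 log10(D / Mpc) + 25."""
    return 10.0 ** ((app_mag_limit - absmag - 25.0) / 5.0)


def vmax_lf(absmag, m_bright, m_faint, solid_angle, bins):
    """1/(V_max - V_min) estimate of phi(M), per unit volume per magnitude.

    Assumes Euclidean volumes V = solid_angle * D**3 / 3 (a low-redshift
    simplification) and a catalog limited to m_bright < m < m_faint.
    """
    absmag = np.asarray(absmag, dtype=float)
    v_max = solid_angle * limiting_distance(absmag, m_faint) ** 3 / 3.0
    v_min = solid_angle * limiting_distance(absmag, m_bright) ** 3 / 3.0
    weighted_counts, edges = np.histogram(absmag, bins=bins, weights=1.0 / (v_max - v_min))
    return weighted_counts / np.diff(edges), edges


# One galaxy with M = -20 in a full-sky catalog with 10 < m < 15 is visible
# between 10 and 100 Mpc, so it contributes 1 / [(4 pi / 3)(100^3 - 10^3)].
phi, edges = vmax_lf([-20.0], m_bright=10.0, m_faint=15.0, solid_angle=4 * np.pi, bins=[-20.5, -19.5])
print(phi)
```

In a real catalog the volumes would come from the comoving volume-redshift relation of the assumed cosmology rather than from $D^3$.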

If the estimate $z_e$ of the true redshift $z$ comes with a large uncertainty, this translates into an uncertainty in the luminosity (this assumes that the error in redshift determination does not affect the observed apparent magnitude). The total number of objects with estimated absolute magnitudes $M_e$ is

$$
\begin{aligned}
N_e(M_e) &= \int dm \int dD_L\; n(m, D_L)\, p(m - M_e|D_L, m)\\
&= \int dD_L\, \frac{dV_{\rm com}}{dD_L} \int_{M_{\min}(D_L)}^{M_{\max}(D_L)} dM\, \phi(M)\, p(M - M_e|D_L, M)\\
&= \int dM\, \phi(M) \int_{D_{\min}(M)}^{D_{\max}(M)} dD_L\, \frac{dV_{\rm com}}{dD_L}\, p(M - M_e|D_L, M),
\end{aligned}
\qquad (6)
$$

where $D_{\min}(M)$ and $D_{\max}(M)$ are the luminosity distances that bound $V_{\min}(M)$ and $V_{\max}(M)$, and where we have used the fact that $5\log_{10} D_e = m - M_e = 5\log_{10} D_L + M - M_e$ (with distances in units of 10 pc), so $p(D_e|D_L, m)\,dD_e = p(M - M_e|D_L, m)\,dM_e$. Note that if there is no error in the distance, then $p(M - M_e)$ is a delta function centered on $M$, and this expression reduces to equation (5).

If $\mathcal{V}(V_{\max}, V_{\min}, M)$ denotes the result of performing the integral over $D_L$ in the final expression above, then

$$N_e(M_e) = \int dM\, \phi(M)\, \mathcal{V}(V_{\max}, V_{\min}, M). \qquad (7)$$

Since $V_{\max}$ and $V_{\min}$ are known functions of $M$, $\mathcal{V}$ itself is really just a complicated function of $M$ (and of $M_e$). To get some feel for its form, suppose that the error in determining the redshift does not depend on apparent magnitude, and, in addition, that the error distribution is a function of the ratio $D_e/D_L$ only. Then $p(M - M_e|D_L, M)$ does not depend explicitly on $D_L$ itself, so it can be taken out of the integral over $D_L$. In this case,

$$\mathcal{V} = \left[V_{\max}(M) - V_{\min}(M)\right]\, p(M_e|M). \qquad (8)$$

Now, $\phi(M)$ times the term in square brackets is the intrinsic $N(M)$ distribution (equation 5), so equation (7) becomes

$$N_e(M_e) = \int dM\, N(M)\, p(M_e|M). \qquad (9)$$

In this case, the observed distribution of $M_e$ is the convolution, not just of the luminosity function $\phi$ with the error distribution $p$, but of the product of $\phi$ and $(V_{\max} - V_{\min})$ with $p$. The inclusion of this second term accounts for the fact that more objects are likely to scatter down from large $z$ to small than the other way around, simply because there is a greater volume at larger $z$. The form of the expression above shows clearly that one generically expects distance errors to scatter objects from the peak of the $N(M)$ distribution to the tails. Unless it is corrected for, this will lead one to overestimate the number density of low and high luminosity objects relative to the mean.

When the distances are known accurately, one can simply use $\phi = N/V$ as a non-parametric estimate of the luminosity function. However, the expression above shows that, when the distance is not known precisely, the relation between $N_e(M_e)$ and $\phi$ is more complicated: determining $\phi$ requires the solution of an integral equation. In this respect, the problem is similar to that of determining $dN/dz$ when $dN_e/dz_e$ and $p(z_e|z)$ are known. Once again, in the case of small errors, one can expand the integrand in a power series and then perform the integral to determine the correction factor $C(M_e)$ that is required if one wishes to weight galaxies by $1/[(1 + C)V]$ and so estimate $\phi$ from the number of observed $M_e$. But the general case is more complicated.



Before moving on to the solution, note that the assumption that $p(M - M_e|D_L, M)$ does not depend explicitly on $D_L$ itself is not crucial. I have mainly made the assumption here so that the form of the argument is clear. If it does depend on $D_L$, then the weighting factor in the integrand is a more complicated function of $M$ than simply $N(M)\,p(M_e|M)$.

2.3 Non-parametric deconvolution and the Vmax method

Since $N_e(M_e)$, $p(M_e|M)$ and $dV_{\rm com}/dz$ are all known, the relation to be solved for $\phi(M)$ is an integral equation of the first kind. Standard arguments show that it can be written as a matrix equation which can then be solved for $\phi(M)$. The problem with this approach is how to account for the fact that the measured $N_e(M_e)$ distribution may be noisy. In particular, since $N_e(M_e)$ is likely to be smoother than $N(M)$, if $N_e$ contains sharp features, then the recovered $N$ will contain even sharper features. If sharp features are expected to be unrealistic, and the measurement is noisy (this will always be true in the tails), then an exact inversion of the integral equation is clearly undesirable. An iterative algorithm which avoids this problem was proposed by Lucy (1974); it converges rapidly and is simple to code (about 20 lines), so it is the method of choice.
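Lucy's iteration (also known as Richardson-Lucy deconvolution) is short enough to show in full. The sketch below is a generic NumPy version on a discrete grid, not the author's code; the toy truth, the Gaussian kernel of width 0.2 and the number of iterations are illustrative assumptions. Each step multiplies the current estimate by the back-projected ratio of observed to predicted counts, so the estimate stays non-negative and, for a kernel whose columns sum to one, the total number of objects is preserved. As in the text, the observed histogram is the default starting guess.

```python
import numpy as np


def lucy_deconvolve(observed, kernel, n_iter=20, guess=None):
    """Lucy (1974) iteration on a discrete grid.

    observed : (n_obs,) measured counts, e.g. a photo-z histogram.
    kernel   : (n_obs, n_true) array; kernel[j, i] is the probability that an
               object in true bin i is assigned to observed bin j.
    guess    : (n_true,) starting estimate; by default the observed histogram
               itself (which then requires n_obs == n_true).
    Returns a list holding the estimate after each iteration.
    """
    obs = np.asarray(observed, dtype=float)
    k = np.asarray(kernel, dtype=float)
    psi = obs.copy() if guess is None else np.asarray(guess, dtype=float).copy()
    detect = k.sum(axis=0)  # probability that an object in true bin i is observed at all
    history = []
    for _ in range(n_iter):
        predicted = k @ psi  # forward model: discrete version of equation (1)
        ratio = np.divide(obs, predicted, out=np.zeros_like(obs), where=predicted > 0)
        psi = np.divide(psi * (k.T @ ratio), detect, out=np.zeros_like(psi), where=detect > 0)
        history.append(psi.copy())
    return history


# Noise-free toy problem (illustrative choices): smooth truth, Gaussian errors.
x = np.linspace(0.0, 3.0, 61)
truth = x**2 * np.exp(-(x**1.5))
kernel = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.2) ** 2)
kernel /= kernel.sum(axis=0, keepdims=True)  # columns sum to one
observed = kernel @ truth
estimates = lucy_deconvolve(observed, kernel, n_iter=100)
print(np.abs(observed - truth).sum(), np.abs(estimates[-1] - truth).sum())
```

With noisy counts one would typically stop after a modest number of iterations rather than iterate to convergence, since early stopping is what keeps spurious sharp features out of the estimate.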

Figure 1 shows how well this method works on mock data. Mock galaxies were distributed in redshift as indicated by the filled circles in the right-hand panels of Figure 1. Estimated redshifts were assigned as shown in the bottom left panel (the particular choice of $p(z_e|z)$ will be discussed shortly). The top left panel compares the estimated and true redshifts. The histograms in the panels on the right show the distribution of estimated redshifts. Note how different they are from the true distribution: although $dN/dz$ has a single well-defined peak, $dN_e/dz_e$ is almost bimodal. Our $p(z_e|z)$ was chosen to produce just this effect: it mimics the effect on some photometric redshift estimators of the 4000 Å break passing from one filter to another. The problem is to use the estimated histogram and the known shape of $p(z_e|z)$ to infer that the true intrinsic distribution traces the locus defined by the filled circles. The histogram was used as the starting guess for the deconvolution algorithm, after which the algorithm converged rapidly to the filled circles (four iterations are shown; they overlap one another closely). Figure 2 provides a more detailed comparison of how well the recovered distribution resembles the true one, and how different the photo-z distribution, which was used as the starting guess, is from the true distribution.

Figure 3 shows results for the luminosity function. The panel on the left shows the intrinsic $N(M)$ (solid circles) and estimated $N_e(M)$ (open circles) distributions in a mock catalog generated assuming the same flat cosmological model as before, but with the intrinsic distribution of luminosities and the apparent magnitude limits chosen to be those of the galaxies in the SDSS (Blanton et al. 2003). The estimated distances were assumed to follow

$$p(D_e|D)\,dD_e = \frac{dx}{x}\,\frac{(\gamma x)^{\gamma}\,e^{-\gamma x}}{\Gamma(\gamma)}, \qquad x = D_e/D, \qquad \gamma = 5.$$

This distribution has $\langle x\rangle = 1$ and $\langle x^2\rangle - \langle x\rangle^2 = 1/\gamma$. With $\gamma = 5$, this error distribution is substantially worse than typical photometric redshift errors. Notice how $N_e(M_e)$ is broader than the true distribution: it has noticeably more objects in the tails, and hence fewer near the peak. This is the generic effect mentioned earlier.

Figure 1. Example of the difference between the intrinsic redshift distribution (filled circles with error bars in panels on right) and the photometric redshift distribution (histograms in panels on right). Panels on the left compare the intrinsic and estimated redshifts. Jagged lines in panels on right show how successive iterations converge rapidly to the intrinsic distribution: the histogram was used as the starting guess.

Figure 2. Comparison of true intrinsic redshift distribution and that recovered by the algorithm described in the text. Symbols with error bars show one realization of the intrinsic distribution (the difference from unity is 'shot-noise' due to the finite size of the sample). Histogram shows the associated photo-z distribution, and jagged curve shows the recovered distribution after four iterations: the histogram was used as the starting guess.
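The Gamma-type error distribution assumed for Figure 3 is easy to sample, which gives a feel for how large it is in magnitudes. The sketch below assumes only what the text states ($\gamma = 5$, apparent magnitudes unaffected by the distance error); the sample size and random seed are arbitrary. The implied scatter in estimated absolute magnitude is about 1 mag rms (analytically $(5/\ln 10)\sqrt{\psi_1(\gamma)}$, with $\psi_1$ the trigamma function).

```python
import numpy as np

GAMMA = 5.0  # value used in the text
rng = np.random.default_rng(0)

# p(x) dx = (dx/x) (GAMMA x)^GAMMA exp(-GAMMA x) / Gamma(GAMMA) is a Gamma
# distribution with shape GAMMA and scale 1/GAMMA, where x = D_e / D.
x = rng.gamma(shape=GAMMA, scale=1.0 / GAMMA, size=1_000_000)

# At fixed apparent magnitude, M_e - M = -5 log10(D_e / D) = -5 log10(x).
dmag = -5.0 * np.log10(x)

print(f"<x> = {x.mean():.4f}, var(x) = {x.var():.4f} (expect 1 and {1 / GAMMA})")
print(f"rms error in absolute magnitude = {dmag.std():.3f} mag")
```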

The open circles in the panel on the right show the result of converting from $N_e(M)$ to $\phi(M)$ using Schmidt's method with no correction for the photometric redshift error distribution. This estimate has more luminous galaxies, and a steeper faint-end slope, than the true distribution shown by the solid circles. For photo-z error distributions which are approximately symmetric, this sort of discrepancy is generic.

The solid lines in the panel on the left show successive iterations of the deconvolution algorithm, starting from the open circles. Convergence to the correct distribution is clearly seen. The solid line in the panel on the right shows the result of applying Schmidt's method to the estimate of $N(M)$ returned by the final iteration shown. It is an excellent approximation to the intrinsic distribution.



Photo-z redshift and luminosity distributions 5 




Figure 3. Reconstruction of the intrinsic N(M) distribution (filled circles) from the distribution of estimated redshifts when distance uncertainties are large (open circles). Error bars on the filled circles assume Poisson statistics. Different curves show how successive iterations of the deconvolution algorithm approximate the intrinsic distribution increasingly well: the open circles were used as the starting guess, and curves show results after iterations 1, 8, 15, and 22. The panel on the left shows the observed distribution, and the panel on the right shows the associated estimate of the luminosity function. The dashed curve shows the input luminosity function. The generic effect of photo-z errors, which the deconvolution algorithm rectifies (compare solid line with filled circles), is to enhance the large-luminosity tail and steepen the faint-end slope (compare open with filled circles).



To illustrate that the generic effect of distance errors is to scatter objects from the peak of $N(M_e)$ into the tails, thus increasing the expected number of high luminosity objects and increasing the slope at small luminosities, Figure 5 shows a similar calculation, but now with a true intrinsic luminosity function that is Gaussian in absolute magnitude. The precise parameter values were chosen to match those of early-type galaxies in the SDSS, and, once again, I have assumed photo-z error distributions (Gaussian with 1 mag rms) that are significantly broader than most photo-z algorithms return. Notice again how the algorithm rapidly converges from the observed counts (open circles) to the true ones (filled circles).
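For this Gaussian case the size of the uncorrected bias can be worked out analytically under simplifying assumptions that are not made in the text: Euclidean geometry with a single faint flux limit (so $V_{\min} = 0$ and $V_{\max} \propto 10^{-0.6M}$), a Gaussian luminosity function with mean $M_*$ and dispersion $\sigma$, and Gaussian magnitude errors of rms $\epsilon$ that do not depend on distance, so that equation (9) applies. Writing $\beta \equiv 0.6\ln 10 \approx 1.38$, a sketch of the calculation is

$$
\begin{aligned}
N(M) &= \phi(M)\,V_{\max}(M) \propto \exp\!\left[-\frac{(M - M_* + \beta\sigma^2)^2}{2\sigma^2}\right],\\
N_e(M_e) &= \int dM\, N(M)\,p(M_e|M) \propto \exp\!\left[-\frac{(M_e - M_* + \beta\sigma^2)^2}{2(\sigma^2+\epsilon^2)}\right],\\
\phi_{\rm naive}(M_e) &\equiv \frac{N_e(M_e)}{V_{\max}(M_e)} \propto \exp\!\left[-\frac{(M_e - M_* - \beta\epsilon^2)^2}{2(\sigma^2+\epsilon^2)}\right],\\
\frac{\int dM_e\,\phi_{\rm naive}(M_e)}{\int dM\,\phi(M)} &= \exp\!\left(\frac{\beta^2\epsilon^2}{2}\right).
\end{aligned}
$$

In this idealization the uncorrected estimate is broader than the truth, its peak moves faintward by $\beta\epsilon^2$, and its integral is inflated; for $\epsilon = 1$ mag these amount to about 1.4 mag and a factor of about 2.6. The excess therefore appears in both tails but is concentrated at the faint end, which is qualitatively the behaviour described for Figure 5; the numbers should not be expected to match that figure, which uses the full SDSS selection.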



3 THE MAXIMUM-LIKELIHOOD METHOD

In magnitude limited samples, an unbiased estimate of the luminosity function is obtained by maximizing the (log of the) likelihood function

$$\mathcal{L}(\alpha) = \prod_i p_i, \qquad \text{where} \qquad p_i = \frac{\phi(L_i|z_i,\alpha)}{S(z_i,\alpha)}, \qquad S(z,\alpha) \equiv \int_{L_{\min}(z)}^{\infty} dL\,\phi(L|z,\alpha) \qquad (10)$$

(Sandage, Tammann & Yahil 1979; Efstathiou, Ellis & Peterson 1988). Here $z_i$ denotes the redshift of galaxy $i$, $\phi(L|z,\alpha)$ is the luminosity function at $z$, with shape specified by the parameters $\alpha$, and $L_{\min}(z)$ is the minimum luminosity which a galaxy at $z$ must have to be observed in the flux limited catalog. That is to say, the parameters $\alpha$ which specify the luminosity function are those for which

$$\frac{\partial \ln\mathcal{L}}{\partial\alpha} = \sum_i \frac{\partial \ln p_i}{\partial\alpha} = 0. \qquad (11)$$

Note that our notation allows the model luminosity function to have a parametric form, in which case $\alpha$ denotes the free parameters of the model, or to be non-parametric, in which case the luminosity function is represented as a sum over bins in luminosity, and $\alpha$ denotes the parameters necessary to specify the bin shapes; the most popular shapes are tophats, Gaussians, or concave polynomials with compact support.
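A minimal sketch of the accurate-distance likelihood of equations (10) and (11) on a mock catalog is given below, written in absolute magnitudes. The Gaussian luminosity function with known dispersion, the Euclidean geometry, the uniform density and all numerical values are illustrative assumptions, and a one-dimensional grid search stands in for a proper optimiser. It shows the estimate recovering the input mean, whereas the raw mean of the flux-limited sample is biased bright.

```python
import math

import numpy as np

# Standard normal CDF, tabulated once with math.erfc and then interpolated so
# that it can be applied to whole arrays (plain NumPy has no erf).
_T = np.linspace(-12.0, 12.0, 4801)
_CDF = np.array([0.5 * math.erfc(-t / math.sqrt(2.0)) for t in _T])


def norm_cdf(t):
    return np.interp(t, _T, _CDF)


def sty_loglike(mu, absmag, absmag_limit, sigma=1.0):
    """ln L of equation (10), up to a constant, for a Gaussian LF in magnitudes.

    p_i = phi(M_i) / int_{-inf}^{M_lim,i} phi(M) dM, where M_lim,i is the faintest
    absolute magnitude visible at galaxy i's (accurately known) distance.  Any
    overall normalisation of phi cancels, so the density field drops out.
    """
    t = (absmag - mu) / sigma
    t_lim = (absmag_limit - mu) / sigma
    return np.sum(-0.5 * t**2 - math.log(sigma) - np.log(norm_cdf(t_lim)))


# Mock catalog (illustrative assumptions): Euclidean geometry, uniform density
# out to 1000 Mpc, Gaussian LF with mean -20.5 and dispersion 1, limit m < 17.7.
rng = np.random.default_rng(42)
n_gal, d_max, m_lim, mu_true = 200_000, 1000.0, 17.7, -20.5
dist = d_max * rng.random(n_gal) ** (1.0 / 3.0)
absmag = rng.normal(mu_true, 1.0, n_gal)
visible = absmag + 5.0 * np.log10(dist) + 25.0 < m_lim
m_sel = absmag[visible]
m_sel_lim = m_lim - 5.0 * np.log10(dist[visible]) - 25.0

mus = np.linspace(-21.5, -19.5, 201)
loglike = np.array([sty_loglike(mu, m_sel, m_sel_lim) for mu in mus])
mu_hat = mus[np.argmax(loglike)]
print(f"{visible.sum()} galaxies; naive mean M = {m_sel.mean():.2f}; likelihood estimate = {mu_hat:.2f}")
```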

If the redshift $z$ is not known precisely, and if the inaccuracy in redshift does not affect the observed apparent magnitude, then the method should be modified as follows. Let $L_i$ and $z_i$ denote the true luminosity and redshift of galaxy $i$, which together determine $\ell_i$, the observed apparent brightness of the object. If $\zeta_i$ denotes the estimated redshift, then this, together with $\ell_i$, determines the estimated luminosity, which we will denote $\lambda_i$.

The number of objects in a flux-limited catalog with estimated values $\zeta$ and $\lambda$ depends on the true intrinsic distribution of $L$ and $z$, and on the distribution of redshift errors. Since errors in redshift do not alter the observed apparent brightness, the number distribution of estimated redshifts is expected to be

$$N(\zeta,\alpha) = \int dz\,\frac{dV_{\rm com}}{dz}\int_{\ell_{\min}} d\ell\; 4\pi D_L^2(z)\,\phi\big(4\pi D_L^2(z)\,\ell\,\big|\,\alpha\big)\,p(\zeta|z,\ell), \qquad (12)$$

if the intrinsic distribution is parametrized by $\alpha$. Here $p(\zeta|z,\ell)$ represents the distribution of estimated redshifts $\zeta$ given true $z$ and $\ell$. Similarly, the joint distribution of estimated $\lambda$ and $\zeta$ is

$$\lambda\,N(\lambda,\zeta,\alpha) = \int dz\,\frac{dV_{\rm com}}{dz}\;4\pi D_L^2(z)\,\ell\;\phi\big(4\pi D_L^2(z)\,\ell\,\big|\,\alpha\big)\,p(\zeta|z,\ell), \qquad (13)$$

where $\ell = \lambda/[4\pi D_L^2(\zeta)]$. Notice that if the redshift-error distribution is independent of $\ell$, then

$$N(\zeta,\alpha) = \int dz\,\frac{dV_{\rm com}}{dz}\,S(z,\alpha)\,p(\zeta|z) = \int dz\,N(z,\alpha)\,p(\zeta|z): \qquad (14)$$

this is just the convolution of the intrinsic redshift distribution (in a flux-limited catalog) with the redshift-error distribution.

By analogy to when the distances are known accurately, the likelihood to be maximized is $\mathcal{L} = \prod_i p_i$, where $p_i$ is the fraction of the number of objects expected to have estimated redshift $\zeta_i$ which also have estimated luminosity $\lambda_i$:

$$\mathcal{L}(\alpha) = \prod_i p_i, \qquad \text{where} \qquad p_i = \frac{N(\lambda_i,\zeta_i,\alpha)}{N(\zeta_i,\alpha)}. \qquad (15)$$

This expression for $p_i$ differs from that in the literature (Chen et al. 2003 omit the factors of $dV_{\rm com}$ in the integrals which define the numerator and denominator).

To check that this expression is indeed the correct one, note that maximizing the (log of the) likelihood requires evaluation of $\sum_i \partial\ln p_i/\partial\alpha$. Writing $N_e$ for the model predictions of equations (12) and (13), this reduces to taking the difference of two terms, the first of which is

$$\sum_i \frac{\partial\ln N_e(\lambda_i,\zeta_i,\alpha)}{\partial\alpha} = \int d\zeta\int d\lambda\;\frac{N_t(\lambda,\zeta)}{N_e(\lambda,\zeta,\alpha)}\,\frac{\partial N_e(\lambda,\zeta,\alpha)}{\partial\alpha},$$

where we have written the sum over objects as an integral over their estimated redshifts and luminosities, and $N_t(\lambda,\zeta)$ is the distribution actually present in the catalog. Similarly, the second term is

$$\sum_i \frac{\partial\ln N_e(\zeta_i,\alpha)}{\partial\alpha} = \int d\zeta\;\frac{N_t(\zeta)}{N_e(\zeta,\alpha)}\,\frac{\partial N_e(\zeta,\alpha)}{\partial\alpha}.$$

Maximizing the likelihood means that we vary $\alpha$ until these two expressions are equal.

Suppose that the true distribution would produce $N_t(\lambda,\zeta)$, and that this true distribution is well described by a particular choice of the parameters, say $\alpha_t$. Then the question is whether the two expressions are equal when $\alpha = \alpha_t$. If not, our definition of $p_i$ is incorrect, because the maximum will occur at some other value of $\alpha$. To see that it is the correct choice, note that when $N_e(\lambda,\zeta,\alpha_t) = N_t(\lambda,\zeta)$, the first expression becomes

$$\int d\zeta\int d\lambda\;\frac{\partial N_e(\lambda,\zeta,\alpha_t)}{\partial\alpha} = \int d\zeta\;\frac{\partial N_e(\zeta,\alpha_t)}{\partial\alpha}.$$

And because

$$N_e(\zeta,\alpha_t) = \int d\lambda\,N_e(\lambda,\zeta,\alpha_t) = \int d\lambda\,N_t(\lambda,\zeta) = N_t(\zeta),$$

the second expression also reduces to $\int d\zeta\,\partial N_e(\zeta,\alpha_t)/\partial\alpha$. Thus, both sums over $i$ reduce to the same quantity. Hence, maximizing the expression for the likelihood given above (equation 15) does indeed yield an accurate, unbiased estimate of the luminosity function. This also demonstrates that omitting the $dV_{\rm com}$ terms present in our expression for $p_i$ would lead to a biased estimate of the shape of the luminosity function.
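The argument above can be checked numerically on a toy problem. In the sketch below, all ingredients are illustrative assumptions: low-redshift Euclidean distances $D_L = (c/H_0)\,z$, $dV_{\rm com}/dz \propto z^2$, a unit-dispersion Gaussian luminosity function whose mean is the single free parameter, Gaussian photo-z errors with rms 0.03, and noise-free expected counts on a grid in place of a real catalog. It works with apparent magnitude rather than $\lambda$, which is equivalent at fixed $\zeta$. Keeping the $dV_{\rm com}/dz$ factor in the likelihood (15) recovers the input mean; dropping it does not.

```python
import numpy as np

# Illustrative toy model (assumptions): D_L = (c/H0) z with c/H0 = 4283 Mpc,
# dV_com/dz proportional to z^2, Gaussian LF in absolute magnitude with unknown
# mean alpha and unit dispersion, Gaussian photo-z errors, and m <= 17.7.
C_OVER_H0, SIGMA_Z, M_LIM = 4283.0, 0.03, 17.7
z = np.linspace(0.01, 0.40, 196)   # true redshifts
zeta = z.copy()                    # estimated redshifts
m = np.linspace(10.0, M_LIM, 78)   # apparent magnitudes in the catalog
dist_mod = 5.0 * np.log10(C_OVER_H0 * z) + 25.0

# p(zeta | z), normalised over the zeta grid for each true z.
p_zeta = np.exp(-0.5 * ((zeta[:, None] - z[None, :]) / SIGMA_Z) ** 2)
p_zeta /= p_zeta.sum(axis=0, keepdims=True)


def expected_counts(alpha, dvdz):
    """Discrete analogue of N_e(m, zeta | alpha); at fixed zeta, lambda is a
    shift of m, so conditioning on m is equivalent to conditioning on lambda."""
    absmag = m[:, None] - dist_mod[None, :]          # (n_m, n_z)
    phi = np.exp(-0.5 * (absmag - alpha) ** 2)
    return (phi * dvdz[None, :]) @ p_zeta.T          # (n_m, n_zeta)


def log_like(alpha, n_obs, dvdz):
    """Sum over cells of n_obs * ln p, with p = N_e(m, zeta) / N_e(zeta) as in (15)."""
    n_e = expected_counts(alpha, dvdz)
    return np.sum(n_obs * np.log(n_e / n_e.sum(axis=0, keepdims=True)))


alpha_true = -20.5
with_dv = z**2
n_obs = expected_counts(alpha_true, with_dv)  # noise-free "observed" counts
alphas = alpha_true + 0.02 * np.arange(-50, 51)
best_with = alphas[np.argmax([log_like(a, n_obs, with_dv) for a in alphas])]
best_without = alphas[np.argmax([log_like(a, n_obs, np.ones_like(z)) for a in alphas])]
print(f"with dV/dz: {best_with:.2f}; dV/dz omitted: {best_without:.2f}; true: {alpha_true}")
```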



4 SOME APPLICATIONS 

4.1 Galaxies and QSOs: dN/dz and φ(L)

The methods above allow one to reconstruct the intrinsic $dN/dz$ and $\phi(L)$ distributions of, e.g., QSOs, LRGs and other galaxy populations in, e.g., the SDSS. These will be useful for a number of clustering analyses, as well as for studying galaxy evolution. As a proof of concept, Figure 4 shows the result of running the $dN/dz$ deconvolution algorithm on publicly available data. The input QSO catalog is from the application of the Non-parametric Bayesian Classification algorithm to the SDSS DR1: this produced a catalog of about 100,000 objects (Richards et al. 2004). For each object, photometric redshifts were determined following Weinstein et al. (2004). About 22,000 of these objects have spectra from which a spectroscopic redshift estimate is available. For this subset of objects, the top left panel compares $z_{\rm phot}$ and $z_{\rm spec}$. The other panels show that the distribution $p(z_{\rm phot} - z_{\rm spec}|z_{\rm spec})$ is rather complex. The panels on the right show the differences between the true (filled circles) and photo-z distributions (histograms), and that the deconvolution algorithm (curve) does a reasonable job of reconstructing the former from the latter.



4.2 Peculiar velocities 

In peculiar velocity surveys such as SFI, ENEAR, EFAR and 6dF, the distance indicator (the Tully-Fisher, $D_n$-$\sigma$, or Fundamental Plane relations) is noisy: typically this noise is approximately twenty percent of the distance, or about 0.4 mag. If uncorrected for, a generic effect of distance uncertainties is to inflate the estimated number of low (and high) luminosity galaxies. This is illustrated in Figure 5, where the error has been set to 1 mag so that the effect is more clearly seen. Since the faint end of the luminosity function provides a strong constraint on galaxy formation models, it is important that it be measured accurately. Therefore, it may be interesting to apply our methods to data from peculiar velocity surveys. In particular, such methods may be necessary for estimating unbiased luminosity functions from HIPASS and ALFALFA.

Figure 4. Left: Distribution of spectroscopic and photometric redshifts in the SDSS DR1 NBC QSO catalog. Right: Reconstruction of the intrinsic dN/dz distribution (filled circles) from that of the photometric redshifts (histogram) using the deconvolution algorithm described previously (curve). The reconstruction is quite accurate, despite the complicated nature of the distance errors.

Figure 5. Same as Figure 3, but now the underlying luminosity function is lognormal (Gaussian in absolute magnitude), and the errors in distance are also assumed to be lognormal. Parameters were chosen to mimic early-type galaxies in the SDSS (from Bernardi et al. 2003), and distance errors were chosen to be about a factor of two larger than in typical peculiar velocity surveys. Notice how the raw photo-z estimate of N(M) (open circles in panel on left) is broader than the true distribution (filled circles), making the estimated luminosity function have a slight excess of luminous galaxies, and a significantly larger excess of faint galaxies (open circles in panel on right). Nevertheless, when started from the estimated distribution, our deconvolution algorithm quickly converges to the true distribution.
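The correspondence between the twenty per cent distance error and the 0.4 mag quoted in Section 4.2 follows from the distance modulus, $m - M = 5\log_{10}(D_L/10\,{\rm pc})$: at fixed apparent magnitude, a fractional distance error $\delta$ shifts the inferred absolute magnitude by

$$\Delta M = -5\log_{10}(1+\delta), \qquad |\Delta M|_{\delta = 0.2} = 5\log_{10}1.2 \approx 0.40\ {\rm mag},$$

or $(5/\ln 10)\,\delta \approx 0.43$ mag to first order in $\delta$.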



4.3 The stellar luminosity function 

Distances to stars are sometimes estimated by the method of photometric parallax: essentially, this method uses the offset from a color-magnitude relation to infer a distance. Because the color-magnitude relation almost certainly has intrinsic scatter (current estimates are about 0.5 mag), the associated distance estimate is noisy: this is entirely analogous to the noise in distance estimates from peculiar velocity surveys. Determination of the stellar luminosity function is an important ingredient in understanding the IMF. Most current determinations are based on the method of Stobie et al. (1989), which assumes small errors in the distance estimate and requires prior knowledge of the shape of the luminosity function. Since our non-parametric methods are accurate even when the noise on the distance estimate is large, it may be interesting to apply them to this problem as well.

Figure 6. Effect of distance errors on the estimated correlation between size and luminosity for early-type galaxies. The steeper dashed line shows the true ⟨R|L⟩ relation, and the shallower line shows a least-squares fit to the dots, which were obtained by using the photo-z estimate of the distance to compute the sizes and absolute magnitudes.
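Read in the other direction, the distance modulus translates the 0.5 mag scatter quoted in Section 4.3 into a distance error:

$$\frac{D_e}{D} = 10^{\,0.5/5} = 10^{0.1} \approx 1.26,$$

i.e. roughly a 25 per cent distance uncertainty, comparable to the twenty per cent typical of the peculiar velocity surveys of Section 4.2.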



4.4 Correlations between observables 

So far, I have mainly discussed how to make accurate estimates of the true redshift and luminosity distributions when only noisy distance estimates are available. However, noisy distance estimates have another important effect for which it is possible to correct. Namely, galaxy observables are known to correlate with one another: the best known of these are the correlation between luminosity $L$ and circular velocity $V_c$ (the Tully-Fisher relation for spirals) or velocity dispersion $\sigma$ (the Faber-Jackson relation for ellipticals), but physical size $R$ and surface brightness $I$ are also well correlated (the Kormendy relation for ellipticals), and $L$ also correlates with size and color for both spirals and ellipticals. Every one of these correlations has been used to constrain galaxy formation models, and, since one measures apparent magnitudes and angular sizes rather than absolute magnitudes and physical sizes, every one of these relations includes at least one distance-dependent quantity.

Noise in the distance estimate will lead to biased estimates of these correlations. To illustrate, Figure 6 shows the correlation between luminosity and size in a catalog which is constructed to mimic the SDSS early-type galaxy sample (Bernardi et al. 2003). The steeper dashed curve shows the true $\langle R|L\rangle$ relation. The dots show the result of assuming that $z_{\rm phot}$ equals $z_{\rm spec}$ plus a Gaussian deviate with rms 0.03 (this amount of scatter in the distance estimate is realistic), and then recomputing the absolute magnitude and size using $z_{\rm phot}$ instead of $z_{\rm spec}$. The shallower dashed line shows $\langle R_{\rm phot}|L_{\rm phot}\rangle$: the change in slope is dramatic.

The qualitative nature of the effect is easy to under- 
stand. Distance errors scatter objects towards the bright and 
faint luminosity tails. This increases the spread along the ab- 
solute magnitude axis. If this were the only effect, then one 
might expect the R — L relation to be shallower. However, 
the distance error causes a correlated change to the size: as- 
suming an object is closer than it really is makes one infer 
a smaller luminosity and size than it really has. So the net 
motion of each point is left-and-down, or right-and-up. If 
these motions were parallel to the principal axis of the true 
relation, the net effect would only be to change the scatter 
of the relation. In this case, they are not, so small distance 
errors have a non-negligible effect. 
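The argument can be made quantitative in a simple limiting case. Assume Euclidean distances (so $L\propto d^2$ at fixed flux and $R\propto d$ at fixed angular size), ignore the magnitude limit, write the true relation as $\log R = a\log L + b + \epsilon$, and let the photometric distance error $\delta\equiv\log_{10}(d_{\rm phot}/d)$ have variance $\sigma_\delta^2$ and be independent of $L$ and $\epsilon$. Then each point moves along a line of slope 1/2 in the $\log R$-$\log L$ plane, and the regression slope of $\log R_{\rm phot}$ on $\log L_{\rm phot}$ is a weighted average of $a$ and 1/2:

$$
\log L_{\rm phot} = \log L + 2\delta, \qquad \log R_{\rm phot} = \log R + \delta,
$$
$$
\frac{{\rm Cov}(\log R_{\rm phot},\,\log L_{\rm phot})}{{\rm Var}(\log L_{\rm phot})}
= \frac{a\,\sigma_{\log L}^2 + 2\sigma_\delta^2}{\sigma_{\log L}^2 + 4\sigma_\delta^2}
= a\,\frac{\sigma_{\log L}^2}{\sigma_{\log L}^2 + 4\sigma_\delta^2}
+ \frac{1}{2}\,\frac{4\sigma_\delta^2}{\sigma_{\log L}^2 + 4\sigma_\delta^2}.
$$

The slope is unchanged only when $a = 1/2$ (motion parallel to the relation); otherwise it is pulled towards 1/2, which makes the relation shallower whenever $a > 1/2$, and the pull is strongest when the intrinsic spread in $\log L$ is small compared with $2\sigma_\delta$.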

This bias is simpler to correct for when only one of the
variables is distance-dependent. For instance, in the case of
the L-color relation, it is only L which is affected by the
distance error (this is not quite true, because k-corrections
depend on wavelength; I am mainly using this to illustrate
an argument). This suggests that if the distance indicator is
unbiased in the mean, then the mean L as a function of color
can be estimated directly. (The scatter around this mean
relation is interesting in its own right: it will, of course,
be affected by the noise in the distance estimate.) In practice,
however, even this case is not entirely straightforward,
because galaxy catalogs are almost always magnitude limited,
and this introduces selection effects into the estimate
of $\langle L|{\rm color}\rangle$; absent distance errors, it is $\langle {\rm color}|L\rangle$
rather than $\langle L|{\rm color}\rangle$ which can be estimated free of
selection effects! When accurate distances are known, these selection
effects can be accounted for by using the quantity $V_{\rm max}$
which played an important role in our discussion of the lu- 
minosity function. This suggests that the methods discussed 
previously should allow one to estimate such correlations in 
photometric galaxy catalogs. When the distance error ap- 
pears in both variables, it is slightly harder to correct, but 
a correction is still possible. Essentially, one simply needs to 
write the expressions given previously in matrix rather than 
scalar notation. Making this generalization correctly is the 
subject of work in progress. 
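For the case of accurate distances, the $V_{\rm max}$ correction of a conditional mean can be sketched as follows. This is a toy under stated assumptions: Euclidean geometry, no k-corrections or evolution, a hypothetical survey reaching 500 Mpc with limit $m<17.7$, and a hypothetical input relation $\langle M|{\rm color}\rangle = -20.5 + {\rm color}$, with absolute magnitude $M$ standing in for L. It compares the unweighted mean of $M$ in color bins of the flux-limited sample with the $1/V_{\rm max}$-weighted mean; it does not implement the photometric-redshift generalization discussed above.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
d_lim, m_lim = 500.0, 17.7                       # hypothetical depth (Mpc) and magnitude limit

d = d_lim * rng.uniform(0, 1, n) ** (1 / 3)      # uniform density in a Euclidean volume
color = rng.uniform(0, 1, n)
abs_mag = -20.5 + color + rng.normal(0, 0.7, n)  # assumed <M|color> = -20.5 + color
app_mag = abs_mag + 5 * np.log10(d) + 25         # distance modulus with d in Mpc

sel = app_mag < m_lim
m_sel = abs_mag[sel]
# Largest distance at which each selected galaxy would still pass the flux limit
d_max = np.minimum(10 ** ((m_lim - m_sel - 25) / 5), d_lim)
v_max = d_max ** 3                               # solid-angle and 4*pi/3 factors cancel below

edges = np.linspace(0, 1, 6)
idx_all = np.digitize(color, edges) - 1
idx_sel = idx_all[sel]
full, naive, weighted = [], [], []
for k in range(len(edges) - 1):
    in_bin = idx_sel == k
    full.append(abs_mag[idx_all == k].mean())   # truth: the whole population
    naive.append(m_sel[in_bin].mean())          # flux-limited sample, unweighted
    weighted.append(np.average(m_sel[in_bin], weights=1 / v_max[in_bin]))
full, naive, weighted = map(np.array, (full, naive, weighted))
print(np.round(full, 2), np.round(naive, 2), np.round(weighted, 2))
```

In this toy the unweighted means are brighter than the population means in every color bin, because intrinsically faint galaxies are only seen nearby. Weighting by $1/V_{\rm max}$ removes this because, for a spatially uniform population, a galaxy's chance of entering the sample is proportional to its $V_{\rm max}$.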



5 DISCUSSION 

I presented two algorithms for estimating the intrinsic red- 
shift and luminosity distributions from photo-z surveys. 
These algorithms improve on previous work by Subbarao 
et al. (1996) and Chen et al. (2003). Subbarao et al. concluded
that numerical simulations were necessary to derive
accurate estimates; my analysis shows that simulations can
be avoided. Chen et al. wrote down a maximum likelihood
expression which they then maximized; I find a different
expression for the likelihood, and provide an analytic
calculation which shows that maximizing this expression does
indeed lead to an unbiased estimate, whereas maximizing their
expression would return a biased answer.

The error in the photometric redshift gives rise to an 
error in the estimated luminosity. Since measurement er- 
rors in the apparent magnitude also give rise to errors in 
the estimated luminosity, it is tempting to treat the photo- 
z errors similarly to how one treats the effects of errors in 
the photometry. However, the two errors are not equivalent
for the simple but important reason that the photo-z error, 
while affecting the estimated luminosity, leaves the observed 
apparent magnitude unchanged. In this respect, it is more 
accurate to view the photo-z error as equivalent to a pecu- 
liar velocity. This motivates reanalysis of relatively shallow 
galaxy surveys for which the peculiar velocity may be a sub- 
stantial fraction of the observed redshift, e.g. faint, nearby, 
low surface brightness galaxies, or galaxies in the 6dF survey 
(Jones et al. 2004). In this case, the error in the distance
estimate comes from the thickness of the Fundamental Plane,
or of the $D_n$-$\sigma$ relation, and is typically of order twenty
percent.
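The distinction can be written out with the distance modulus $DM(z)$, ignoring k-corrections. With a photometric redshift, the selection variable $m$ is untouched and only the inferred absolute magnitude changes; a photometric error $\epsilon$ instead enters both $m$ and $M$, and hence also the selection $m_{\rm obs} < m_{\rm lim}$:

$$
M_{\rm phot} = m - DM(z_{\rm phot}) = M + \left[DM(z) - DM(z_{\rm phot})\right],
\qquad
m_{\rm obs} = m + \epsilon, \quad M_{\rm obs} = M + \epsilon .
$$

For the twenty percent distance error quoted above, $|\Delta M| = 5\log_{10}(1.2) \approx 0.4$ mag.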

The fractional error on the distances to most stars in our 
galaxy (those for which parallax measurements are not avail- 
able) is relatively large. Stobie, Ishida & Peacock (1989)
discuss a method for estimating the luminosity function in 
the case of photometric parallaxes derived from the color- 
magnitude relation, but the approach is parametric (it re- 
quires an accurate guess of the intrinsic shape of the lu- 
minosity function), and it assumes that the distance errors 
are small. Our approach provides accurate non-parametric 
estimates which are valid even when the errors are large. 
We intend to apply our methods to provide non-parametric 
estimates of the stellar luminosity function which are not 
compromised by the noise in the distance estimator. 

Both the maximum likelihood and the $V_{\rm max}$ estimators
I derived assume that galaxies do not evolve, so one must 
break the sample up into narrow redshift bins before anal- 
ysis. This is risky in principle, because one wants a narrow 
bin in true redshift, but only photo-zs are available. In prac- 
tice, photo-zs are sufficiently accurate that a narrow bin 
in photo-z is still quite narrow in true-z. The maximum-
likelihood and $V_{\rm max}$ estimators of the luminosity function
have another drawback: they ignore the fact that different
galaxy types require different k(z)-corrections, so one must
preselect the sample to ensure that it contains galaxies that
are of the same type. Extending the analysis to allow for 
evolution and type is clearly desirable, and is the subject of 
work in progress. 
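How narrow a photo-z bin is in true redshift can be checked with a quick simulation. The sketch below assumes a flat distribution of true redshifts, Gaussian photo-z errors with the rms of 0.03 used earlier, and a hypothetical bin $0.18 \le z_{\rm phot} < 0.20$; under these assumptions the true redshifts in the bin have an rms spread close to $\sqrt{w^2/12+\sigma^2}$ for bin width $w$ and photo-z rms $\sigma$.

```python
import numpy as np

rng = np.random.default_rng(7)
z_spec = rng.uniform(0.0, 0.4, 200_000)               # assumed flat distribution of true z
z_phot = z_spec + rng.normal(0.0, 0.03, z_spec.size)  # assumed Gaussian photo-z error, rms 0.03

lo, hi = 0.18, 0.20                                   # hypothetical narrow photo-z bin
in_bin = (z_phot >= lo) & (z_phot < hi)
z_true = z_spec[in_bin]

spread = z_true.std()
# Flat prior far from the edges: a box of width (hi - lo) convolved with the photo-z error
expected = np.sqrt((hi - lo) ** 2 / 12 + 0.03 ** 2)
p5, p95 = np.percentile(z_true, [5, 95])
print(f"{in_bin.sum()} objects; true-z rms {spread:.4f} (expected {expected:.4f}); "
      f"90% of true z in [{p5:.3f}, {p95:.3f}]")
```

With these numbers the spread in true redshift is set mainly by the photo-z error rather than by the bin width, so narrowing the photo-z bin further buys little.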